Optimising Event Locations and Urban Management in Melbourne¶

💡
Authored by: Sahan Chamod


Duration: 90 mins
Level: Intermediate
Pre-requisite Skills: Python, Data Analysis, Pandas, Data Visualisation

Scenario¶

The City of Melbourne regularly hosts both public and private events ranging from festivals and cultural shows to weddings and workshops. However, unstructured event planning may lead to crowding, traffic issues, or poor accessibility.

This use case explores how existing pedestrian and parking data can be used to support city planners and event organisers in selecting more suitable and data-driven locations for hosting events, with the goal of improving public accessibility, reducing congestion, and enhancing overall urban management.

User Stories¶

  • As a city planner, I want to analyse footfall and parking data so I can designate zones that are best suited for public and private events in Melbourne.

  • As an event planner, I want to identify locations that match the nature of the event (e.g., small gathering vs. public concert) so I can select sites with either high public engagement or greater privacy and accessibility.

  • As an urban policy adviser, I want to understand pedestrian dynamics and parking availability so that I can recommend improvements to Melbourne’s urban planning strategies.

What This Use Case Will Teach You¶

  • Application of clustering (KMeans) for unsupervised classification of real-world spatial data.

  • Use of geospatial analysis and buffer-based joins to integrate footfall and parking data.

  • Visualisation using interactive maps (Folium) for decision-making.

  • Scenario-based reasoning using real public datasets.

  • Application of data-driven insights to guide urban planning and event management.

Background and Introduction¶

Melbourne is one of the most vibrant cities in Australia, known for its frequent public events, cultural festivals, and private functions. With limited public space and fluctuating pedestrian activity, it becomes important to choose event locations that align with crowd patterns and access infrastructure.

This project focuses on analysing pedestrian sensor data and parking availability to identify areas best suited for public and private events. Public events benefit from areas with high visibility and foot traffic, while private events require less crowded, more accessible zones. This data-driven approach supports better planning, reduced disruption, and efficient use of urban space.

Datasets Used¶

This use case integrates four different datasets, each contributing to various aspects of spatial and event analysis:

  1. Pedestrian Sensor Location Data
    Contains metadata for fixed pedestrian sensors across Melbourne, including sensor names, installation dates, status, orientation, and geographic coordinates (latitude, longitude). This dataset was used to map and identify sensor locations spatially.

    https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-sensor-locations/information/

  2. Pedestrian Count Data
    Hourly pedestrian counts from each sensor, indexed by location_id and sensing_date. This data was aggregated to calculate average daily and hourly pedestrian footfall, which formed the basis for clustering and identifying high-traffic areas.

    https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/

  3. Parking Data
    Geospatial data indicating available parking locations across the city, with fields such as roadsegmentdescription, latitude, and longitude. This was used to assess parking availability within a 150-meter radius of pedestrian sensors to support the identification of private event zones.

    https://data.melbourne.vic.gov.au/explore/dataset/on-street-parking-bays/information/

  4. Event Registry Data
    A supplementary dataset listing previously held events across Melbourne, including event titles, start/end dates, and categories (e.g., filming, music). This was reviewed to understand the nature of the events held in the city in previous years.

    https://data.melbourne.vic.gov.au/explore/dataset/event-permits-2014-2018-including-film-shoots-photo-shoots-weddings-christmas-pa/information/

Libraries¶

In [1]:
from config import API_KEY
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
import warnings
import folium
from folium.plugins import MarkerCluster, HeatMap
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from io import BytesIO
import base64
import geopandas as gpd

Read Data Using API¶

In [2]:
# Hide unnecessary warnings
warnings.filterwarnings('ignore')

# Function to collect data from the Melbourne Open Data export API
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    file_format = 'csv'

    url = f'{base_url}{dataset_id}/exports/{file_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        'api_key': API_KEY  # Replace with your personal key
    }

    # GET request
    response = requests.get(url, params=params)
    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset 
    else:
        print(f'Request failed with status code {response.status_code}')

# Read data using the function
event_df = collect_data('event-permits-2014-2018-including-film-shoots-photo-shoots-weddings-christmas-pa')
pedestrian_df = collect_data('pedestrian-counting-system-monthly-counts-per-hour')
parking_df = collect_data('on-street-parking-bays')
p_sensor_loc = collect_data('pedestrian-counting-system-sensor-locations')
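Since the pedestrian dataset alone runs to over two million rows, re-downloading it on every run is slow (the notebook later keeps an in-memory copy for the same reason). Below is a minimal sketch of an on-disk cache wrapper; `cached_fetch` is a hypothetical helper, where `fetch` stands for any downloader such as the `collect_data` function above, and the pickle-file layout is an illustrative choice:

```python
import os
import pandas as pd

def cached_fetch(dataset_id, fetch, cache_dir='cache'):
    """Return a cached DataFrame if one exists, otherwise download and cache it.

    `fetch` is any callable of the form fetch(dataset_id) -> DataFrame,
    e.g. the collect_data function defined above.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f'{dataset_id}.pkl')
    if os.path.exists(path):
        return pd.read_pickle(path)  # cache hit: skip the network call
    df = fetch(dataset_id)
    if df is not None:
        df.to_pickle(path)  # save for the next run
    return df
```

Usage would look like `pedestrian_df = cached_fetch('pedestrian-counting-system-monthly-counts-per-hour', collect_data)`; deleting the `cache` folder forces a fresh download.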

Data Frames¶

In [3]:
p_sensor_loc.head()
Out[3]:
location_id sensor_description sensor_name installation_date note location_type status direction_1 direction_2 latitude longitude location
0 24 Spencer St-Collins St (North) Col620_T 2013-09-02 NaN Outdoor A East West -37.818880 144.954492 -37.81887963, 144.95449198
1 25 Melbourne Convention Exhibition Centre MCEC_T 2013-08-28 NaN Outdoor A East West -37.824018 144.956044 -37.82401776, 144.95604426
2 36 Queen St (West) Que85_T 2015-01-20 Pushbox Upgrade, 03/08/2023 Outdoor A North South -37.816525 144.961211 -37.81652527, 144.96121062
3 41 Flinders La-Swanston St (West) Swa31 2017-06-29 NaN Outdoor A North South -37.816686 144.966897 -37.81668634, 144.96689733
4 44 Tin Alley-Swanston St (West) UM3_T 2015-04-15 Pushbox Upgrade, 30/06/2023 Outdoor A North South -37.796987 144.964413 -37.79698741, 144.96441306
In [4]:
event_df.head()
Out[4]:
title event_start event_end category_1 category_2 location
0 Anthony 2015-02-17 2015-02-17 Filming - Movie NaN Inner Suburb Locations
1 Spirit Of The Game 2015-08-18 2015-08-18 Filming - Movie NaN Carlton Gardens
2 Ali's Wedding 2015-11-30 2015-11-30 Filming - Movie NaN Inner Suburb Locations
3 Dogfight 2016-08-23 2016-08-23 Filming - Movie NaN Inner Suburb Locations
4 Dogfight Unit Base 2016-09-21 2016-09-21 Filming - Movie NaN Flagstaff Gardens
In [5]:
pedestrian_df.head()
Out[5]:
id location_id sensing_date hourday direction_1 direction_2 pedestriancount sensor_name location
0 39420220226 39 2022-02-26 4 6 2 8 AlfPl_T -37.81379749, 144.96995745
1 141720230914 14 2023-09-14 17 1021 230 1251 SanBri_T -37.82011242, 144.96291897
2 401120220119 40 2022-01-19 11 92 121 213 Spr201_T -37.80999341, 144.97227587
3 361120220101 36 2022-01-01 11 43 29 72 Que85_T -37.81652527, 144.96121062
4 851020211222 85 2021-12-22 10 96 74 170 488Mac_T -37.79432415, 144.92973378
In [6]:
parking_df.head()
Out[6]:
roadsegmentid kerbsideid roadsegmentdescription latitude longitude lastupdated location
0 22377 NaN The Avenue between MacArthur Road and Gatehous... -37.791116 144.957604 2023-10-31 -37.7911156, 144.9576042
1 22377 NaN The Avenue between MacArthur Road and Gatehous... -37.791013 144.957570 2023-10-31 -37.7910129, 144.9575696
2 22377 NaN The Avenue between MacArthur Road and Gatehous... -37.790961 144.957556 2023-10-31 -37.7909613, 144.9575557
3 22377 NaN The Avenue between MacArthur Road and Gatehous... -37.790806 144.957525 2023-10-31 -37.7908061, 144.9575254
4 22377 NaN The Avenue between MacArthur Road and Gatehous... -37.790435 144.957467 2023-10-31 -37.7904352, 144.9574667
In [7]:
print('Shapes of dataframes\n')
print(f'Parking df\t\t\t: {parking_df.shape}')
print(f'Event df\t\t\t: {event_df.shape}')
print(f'Pedestrian df\t\t\t: {pedestrian_df.shape}')
print(f'Parking sensor location\t\t: {p_sensor_loc.shape}')
Shapes of dataframes

Parking df			: (23864, 7)
Event df			: (2827, 6)
Pedestrian df			: (2305280, 9)
Parking sensor location		: (139, 12)

Data Cleaning¶

Event Data Frame¶

In [8]:
# Replace some values to fix spelling
event_df['category_1'] = event_df['category_1'].replace(
    {'Public Event - Run/Walk':'Public Event - Run Walk',
     'Pubilc Event - Non-ticketed': 'Public Event - Non-ticketed',
     'Public Event - Non Ticketed': 'Public Event - Non-ticketed'
     }
)
In [9]:
# Convert event_start to datetime
event_df['event_start'] = pd.to_datetime(event_df['event_start'], errors='coerce')

Pedestrian Data Frame¶

In [10]:
# Convert date column to date type
pedestrian_df['sensing_date'] = pd.to_datetime(pedestrian_df['sensing_date'], errors='coerce')

Exploratory Data Analysis¶

Overview of Event Permits (2014-2018)¶

How many events?

This section outlines the types of permitted events that took place in the Melbourne city area between 2014 and 2018. Insights drawn from this dataset will be used in the project to identify common event types and their typical time frames. This information will also support the analysis of the most suitable locations and optimal timings for hosting various events across the city.

In [11]:
print(f'Total number of events that took place in the above time frame: {event_df.shape[0]}')
Total number of events that took place in the above time frame: 2827

What kind of events?

In [12]:
eve_cat1_counts = event_df['category_1'].value_counts()
print(eve_cat1_counts)
category_1
Wedding                                    615
Public Event - Non-ticketed                511
Promotion                                  430
Filming - TV Series                        200
Public Event - Run Walk                    198
Filming - TVC                              158
Public Event - Ticketed                    111
Public Event - Low Impact Activity          92
Filming - Photo shoot                       90
Private Event                               87
Filming - Unit Base                         58
Filming - Student                           57
Public Event - Music Event                  41
Filming - Other                             40
Public Event - Media/Launch Event           36
Public Event - Memorial                     25
Filming - Movie                             21
Public Event - Cycling Event                19
Public Event - Parade                       16
Public Event - Music                         6
Public Event - Cycling                       3
Filming - TV Series, Filming - Unit          2
Public Event - Media Launch                  2
Public Event - Outside Broadcast             2
Filming -- Other                             1
Filming - TVC, Recreation and Sport          1
Private Event -                              1
Public Event - Media Launch Event            1
Filming - TV Series Filming - TV Series      1
Public Event                                 1
Public Event - Low Impact Activity,          1
Name: count, dtype: int64

According to the results above, the dataset contains a large number of event categories, which makes the analysis unnecessarily complex. It is more effective to introduce a new classification that groups events under two broader categories: “Public” and “Private”.

In [13]:
fig, ax = plt.subplots(figsize=(7, 5))  # Set fixed size for the chart

# Classify as 'Public' or 'Private'
event_df['event_type'] = event_df['category_1'].apply(
    lambda x: 'Public' if 'Public' in x or 'Promotion' in x else 'Private'
)

# Count event types
event_counts = event_df['event_type'].value_counts()
labels = event_counts.index
values = event_counts.values
xpos = np.arange(len(labels))

# Plot bars
bars = ax.bar(xpos, values, color=['teal', 'indianred'], width=0.6)

# Add labels on bars
for i, value in enumerate(values):
    ax.text(xpos[i], value + 20, value, ha='center', fontsize=10)

# Add total events text on the side
total_events = sum(values)
ax.text(
    xpos[-1] + 0.7, max(values) / 2,
    f'Total Events: {total_events}'
)

# Tidy up axes
ax.set_xticks(xpos)
ax.set_xticklabels(labels)
ax.set_ylabel('Number of Events')
ax.set_title('Public vs Private Events (2014–2018)')
ax.set_xlim(-0.5, xpos[-1] + 1.5)  # Prevent auto-resize due to side text

plt.tight_layout()
plt.show()

The bar chart shows that from 2014 to 2018, Melbourne hosted 2,827 events in total, with slightly more public events (1,495) than private events (1,332). This indicates a fairly balanced distribution, with public events only marginally more common.

In [14]:
# Filter data for Public and Private events separately
public_events = event_df[event_df['event_type'] == 'Public']
private_events = event_df[event_df['event_type'] == 'Private']

# Count the number of events by category1
public_counts = public_events['category_1'].value_counts()
private_counts = private_events['category_1'].value_counts()

# Plot horizontal bar charts
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

# Public events plot
axes[0].barh(public_counts.index, public_counts.values, color='teal')
axes[0].set_title('Public Events by Category1 (2014-2018)')
axes[0].set_xlabel('Number of Events')
axes[0].invert_yaxis()

# Private events plot
axes[1].barh(private_counts.index, private_counts.values, color='indianred')
axes[1].set_title('Private Events by Category1 (2014-2018)')
axes[1].set_xlabel('Number of Events')
axes[1].invert_yaxis()

plt.tight_layout()
plt.show()

Public vs Private Events by Category (2014–2018)

  • Public events were primarily:

    • Non-ticketed public gatherings
    • Promotional activities
    • Run/walk events
  • Private events were largely:

    • Weddings
    • Filming activities, including TV series, commercials (TVC), and photo shoots

This distribution highlights that public spaces in Melbourne were frequently used for community and promotional purposes, whereas private events were focused on personal celebrations and media production.

In [15]:
event_df.head()
Out[15]:
title event_start event_end category_1 category_2 location event_type
0 Anthony 2015-02-17 2015-02-17 Filming - Movie NaN Inner Suburb Locations Private
1 Spirit Of The Game 2015-08-18 2015-08-18 Filming - Movie NaN Carlton Gardens Private
2 Ali's Wedding 2015-11-30 2015-11-30 Filming - Movie NaN Inner Suburb Locations Private
3 Dogfight 2016-08-23 2016-08-23 Filming - Movie NaN Inner Suburb Locations Private
4 Dogfight Unit Base 2016-09-21 2016-09-21 Filming - Movie NaN Flagstaff Gardens Private
In [16]:
# Define subcategory logic
def classify_event(row):
    if row['event_type'] == 'Private':
        if 'Wedding' in row['category_1']:
            return 'Wedding'
        elif 'Filming' in row['category_1']:
            return 'Filming'
        else:
            return 'Other'
    elif row['event_type'] == 'Public':
        if 'Non-ticketed' in row['category_1']:
            return 'Non-ticketed'
        elif 'Promotion' in row['category_1']:
            return 'Promotion'
        else:
            return 'Other'
    return 'Unknown'

# Apply subcategory classification
event_df['sub_category'] = event_df.apply(classify_event, axis=1)
In [17]:
# Define season mapping function
def get_season(month):
    if month in [12, 1, 2]:
        return 'Summer'
    elif month in [3, 4, 5]:
        return 'Autumn'
    elif month in [6, 7, 8]:
        return 'Winter'
    else:
        return 'Spring'

# Add year, month, season, and sort helper columns
season_to_month = {'Summer': '01', 'Autumn': '04', 'Winter': '07', 'Spring': '10'}

event_df['month'] = event_df['event_start'].dt.month
event_df['season'] = event_df['month'].apply(get_season)
event_df['year'] = event_df['event_start'].dt.year
event_df['season_month'] = event_df['season'].map(season_to_month)
event_df['season_sort_label'] = pd.to_datetime(event_df['year'].astype(str) + '-' + event_df['season_month'])
event_df['year_season_label'] = event_df['year'].astype(str) + '-' + event_df['season']

# Group data with season ordering
label_grouped = event_df.groupby(
    ['year_season_label', 'season_sort_label', 'event_type', 'sub_category']
).size().reset_index(name='count')

label_grouped = label_grouped.sort_values('season_sort_label')

# Plot for Public Events
plt.figure(figsize=(14, 5))
sns.lineplot(
    data=label_grouped[label_grouped['event_type'] == 'Public'],
    x='year_season_label', y='count', hue='sub_category', marker='o'
)
plt.xticks(rotation=45)
plt.title('Seasonal Trends of Public Events (Year–Season)')
plt.ylabel('Number of Events')
plt.xlabel('Season')
plt.tight_layout()
plt.show()
  • A clear seasonal pattern is evident in public events between 2014 and 2018, with noticeable peaks during spring and reduced activity in winter.

This trend highlights a strong preference for organizing public events in warmer, more favorable weather conditions, particularly during spring.

In [18]:
# Plot for Private Events
plt.figure(figsize=(14, 5))
sns.lineplot(
    data=label_grouped[label_grouped['event_type'] == 'Private'],
    x='year_season_label', y='count', hue='sub_category', marker='o'
)
plt.xticks(rotation=45)
plt.title('Seasonal Trends of Private Events (Year–Season)')
plt.ylabel('Number of Events')
plt.xlabel('Season')
plt.tight_layout()
plt.show()
  • Private events from 2014 to 2018 also exhibit a distinct seasonal pattern, with weddings showing strong peaks in spring and summer and dropping significantly in winter. Filming events, in contrast, appear more evenly distributed throughout the year, indicating they are less influenced by seasonal conditions.

This contrast highlights how weather and outdoor suitability primarily impact personal celebrations like weddings, while media productions maintain steady demand year-round.


Overview of Pedestrian Data Frame¶

In [19]:
# Keep a copy of df as it takes considerable time to reload from API
raw_pedestrian_df = pedestrian_df.copy()
In [20]:
# Merge on 'location_id'
pedestrian_df = pedestrian_df.merge(
    p_sensor_loc[['location_id', 'sensor_description']],
    on='location_id',
    how='left'
)

# Filter to include only records up to 2025-04-15 
pedestrian_df = pedestrian_df[pedestrian_df['sensing_date'] <= '2025-04-15']
In [21]:
# Dataset Overview
print(
    f"This dataset contains {pedestrian_df.shape[0]:,} hourly pedestrian records collected from "
    f"{pedestrian_df['location_id'].nunique()} unique locations using {pedestrian_df['sensor_name'].nunique()} sensors. "
)
print(f"The data spans from {pedestrian_df['sensing_date'].min().strftime('%Y-%m-%d')} to {pedestrian_df['sensing_date'].max().strftime('%Y-%m-%d')}, "
    f"covering all 24 hours of the day.")
This dataset contains 2,355,801 hourly pedestrian records collected from 98 unique locations using 96 sensors. 
The data spans from 2021-07-01 to 2025-04-15, covering all 24 hours of the day.
In [22]:
plt.figure(figsize=(10, 6))
sns.boxplot(x=pedestrian_df['pedestriancount'], color='teal')
plt.title("Boxplot of Pedestrian Count")
plt.xlabel("Pedestrian Count")
plt.grid(True)
plt.show()

This boxplot shows that most locations and times have low pedestrian counts, usually under 500.

There are also many much higher values, reaching nearly 10,000, which likely come from busy areas such as the city center or places hosting special events. The data clearly shows that foot traffic varies considerably by location and time of day.

In [23]:
plt.figure(figsize=(10, 6))
sns.histplot(pedestrian_df['pedestriancount'], bins=100, kde=False, color = 'indianred')
plt.title("Histogram of Pedestrian Count")
plt.xlabel("Pedestrian Count")
plt.ylabel("Number of Records")
plt.show()

The histogram confirms that pedestrian counts are heavily right-skewed, with the vast majority of records at low values. This is expected in a city environment, where only a few places such as the CBD or event areas experience very high foot traffic.

In [24]:
avg_by_hour = pedestrian_df.groupby('hourday')['pedestriancount'].mean().reset_index()

plt.figure(figsize=(10, 6))
plt.plot(avg_by_hour['hourday'], avg_by_hour['pedestriancount'], 
         marker='o', color='teal')

plt.title("Average Pedestrian Count by Hour")
plt.xlabel("Hour of Day")
plt.ylabel("Average Pedestrian Count")
plt.xticks(range(0, 24))
plt.show()

The line chart illustrates the average pedestrian count in Melbourne throughout a typical day, based on data aggregated from multiple sensor locations.

The data shows a clear pattern of pedestrian movement across 24 hours:

  • Lowest activity occurs between midnight and 5 AM.
  • A steady increase begins around 6 AM, aligning with morning commute hours.
  • The first major peak appears between 8 AM and 9 AM, indicating typical work or school start times.
  • After a slight midday dip, pedestrian activity peaks again between 1 PM and 3 PM, possibly due to lunch breaks and city movement.
  • From 4 PM onward, there's a gradual decline, reflecting the end of the workday and evening transitions.

This pattern provides insight into when foot traffic is at its highest and lowest across the city, which is valuable for event timing, resource planning, and urban management.
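The aggregate curve above averages across all sensors, so it hides per-location differences; for event timing it can also help to compute each sensor's own busiest hour. A minimal sketch, using a tiny synthetic frame with the same columns as `pedestrian_df` (the sensor names and counts are illustrative, not real data):

```python
import pandas as pd

# Tiny synthetic stand-in with the columns used above:
# sensor_description, hourday, pedestriancount
df = pd.DataFrame({
    'sensor_description': ['A', 'A', 'A', 'B', 'B', 'B'],
    'hourday':            [8, 13, 18, 8, 13, 18],
    'pedestriancount':    [120, 300, 90, 40, 60, 220],
})

# Busiest hour per sensor: average per (sensor, hour), then keep the top row
peak_hours = (
    df.groupby(['sensor_description', 'hourday'])['pedestriancount'].mean()
      .reset_index()
      .sort_values('pedestriancount', ascending=False)
      .drop_duplicates('sensor_description')
      .set_index('sensor_description')['hourday']
)
print(peak_hours.to_dict())  # {'A': 13, 'B': 18}
```

Applied to the real `pedestrian_df`, the same chain yields a per-sensor peak hour that can be compared against an event's planned schedule.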

In [25]:
# Group by location and calculate average pedestrian count
busy_locations = pedestrian_df.groupby(['location_id', 'sensor_description'])['pedestriancount'].mean().reset_index()

# Sort and get the top 10
top_10 = busy_locations.sort_values(by='pedestriancount', ascending=False).head(10)

# Plot
plt.figure(figsize=(12, 4))
sns.barplot(data=top_10, y='sensor_description', x='pedestriancount', color='lightseagreen')
plt.title("Top 10 Busiest Locations by Average Pedestrian Count")
plt.xlabel("Average Pedestrian Count")
plt.ylabel("Sensor Location (Road)")
plt.tight_layout()
plt.show()

This chart shows the top 10 busiest locations based on average pedestrian count. Flinders La–Swanston St (West), Elizabeth Street, and Southbank are among the most active areas, making their surroundings well suited to large public events or campaigns.

In [26]:
# Extract latitude and longitude
pedestrian_df[['lat', 'lon']] = pedestrian_df['location'].str.split(', ', expand=True).astype(float)

# Aggregate average pedestrian count per location
agg_df = pedestrian_df.groupby(
    ['location_id', 'sensor_description', 'lat', 'lon'],
    as_index=False
)['pedestriancount'].mean()
agg_df.rename(columns={'pedestriancount': 'avg_count'}, inplace=True)

# Scale avg_count and apply KMeans clustering
scaler = StandardScaler()
scaled = scaler.fit_transform(agg_df[['avg_count']])
kmeans = KMeans(n_clusters=4, random_state=1)
agg_df['cluster'] = kmeans.fit_predict(scaled)

# Order clusters based on avg pedestrian count
cluster_order = agg_df.groupby('cluster')['avg_count'].mean().sort_values().index.tolist()
ordered_colors = ['green', 'gold', 'darkorange', 'indianred']
color_map = {cluster: ordered_colors[i] for i, cluster in enumerate(cluster_order)}

# Function to generate embedded line chart in popup
def generate_popup_html(sensor_name, avg_count, hourly_data):
    fig, ax = plt.subplots(figsize=(3, 2))
    ax.plot(hourly_data.index, hourly_data.values, linewidth=1)
    ax.set_title('Hourly Avg')
    ax.set_xlabel('Hour')
    ax.set_ylabel('Count')
    plt.tight_layout()

    buffer = BytesIO()
    plt.savefig(buffer, format='png')
    buffer.seek(0)
    img_base64 = base64.b64encode(buffer.read()).decode()
    plt.close()

    html = f"""
    <h4>{sensor_name}</h4>
    <p>Avg Count: {int(avg_count)}</p>
    <img src='data:image/png;base64,{img_base64}' width='220'/>
    """
    return html

# Create map
mel_map = folium.Map(location=[-37.81, 144.96], zoom_start=14, tiles='CartoDB positron')

legend_html = '''
 <div style="
     position: fixed; 
     bottom: 50px; left: 50px; width: 180px; height: 150px; 
     background-color: white; z-index:9999; font-size:14px;
     border:2px solid grey; border-radius:8px; padding: 10px;
     box-shadow: 2px 2px 5px rgba(0,0,0,0.3);">
     <b>Average Pedestrian Count</b><br>
     <i style="background:green; width:10px; height:10px; float:left; margin-right:8px; opacity:0.9;"></i> Very Low<br>
     <i style="background:gold; width:10px; height:10px; float:left; margin-right:8px; opacity:0.9;"></i> Low<br>
     <i style="background:darkorange; width:10px; height:10px; float:left; margin-right:8px; opacity:0.9;"></i> Moderate<br>
     <i style="background:indianred; width:10px; height:10px; float:left; margin-right:8px; opacity:0.9;"></i> High<br>
 </div>
'''

mel_map.get_root().html.add_child(folium.Element(legend_html))

# Add each marker with chart popup and color by cluster
for _, row in agg_df.iterrows():
    hourly_data = pedestrian_df[
        pedestrian_df['sensor_description'] == row['sensor_description']
    ].groupby('hourday')['pedestriancount'].mean()

    popup_html = generate_popup_html(row['sensor_description'], row['avg_count'], hourly_data)
    folium.CircleMarker(
        location=[row['lat'], row['lon']],
        radius=7,
        color=color_map[row['cluster']],
        fill=True,
        fill_opacity=0.8,
        popup=folium.Popup(popup_html, max_width=300)
    ).add_to(mel_map)

mel_map
Out[26]:
Make this Notebook Trusted to load map: File -> Trust Notebook

The map displays pedestrian sensor locations across the city of Melbourne, clustered based on their average pedestrian count. Each point on the map is color-coded to indicate the level of pedestrian activity:

  • Red: High average footfall
  • Orange: Moderate footfall
  • Yellow: Low footfall
  • Green: Very low footfall

The clustering approach helps identify areas of high and low pedestrian activity across the city.

High and moderate footfall locations are mostly concentrated around central Melbourne, indicating their suitability for public events. In contrast, low and very low footfall areas are found around the outer zones, suggesting their potential for private events, especially when supported by nearby parking availability.

This is an interactive map, allowing users to click on each location to view additional information such as the sensor name, average daily count, and a line chart showing the hourly pedestrian trend. This feature provides deeper insights and supports more informed event planning decisions.
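The clustering above fixes `n_clusters=4` up front. One quick, standard way to sanity-check such a choice is the elbow method: fit KMeans for a range of k and look for where the inertia curve flattens. A sketch on synthetic average counts (illustrative values, not the notebook's actual `agg_df`):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for agg_df['avg_count']: three well-separated bands
rng = np.random.default_rng(1)
counts = np.concatenate([
    rng.normal(100, 10, 40),    # quiet sensors
    rng.normal(500, 30, 40),    # moderate sensors
    rng.normal(1500, 80, 20),   # very busy sensors
]).reshape(-1, 1)

X = StandardScaler().fit_transform(counts)

# Fit KMeans for each k and record the inertia (within-cluster sum of squares)
inertias = {
    k: KMeans(n_clusters=k, random_state=1, n_init=10).fit(X).inertia_
    for k in range(1, 8)
}
for k, v in inertias.items():
    print(k, round(v, 2))
# Inertia falls steeply up to the true number of bands, then flattens;
# the "elbow" in that curve suggests a reasonable k.
```

Running the same loop on the real `agg_df[['avg_count']]` would show whether four clusters is a defensible choice or whether fewer bands already capture the footfall structure.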

Overview of Parking Data Frame¶

In [27]:
parking_df = collect_data('on-street-parking-bays')
In [28]:
total_parking_spots = parking_df[['latitude', 'longitude']].drop_duplicates().shape[0]
print(f"Total unique parking spots: {total_parking_spots}")
Total unique parking spots: 23864
In [29]:
# Create a base map centered around Melbourne
parking_map = folium.Map(location=[-37.81, 144.96], zoom_start=14, tiles='CartoDB positron')

# Plot each unique parking spot
for _, row in parking_df.drop_duplicates(subset=['latitude', 'longitude']).iterrows():
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=2,  # small for better visibility with large data
        color='steelblue',
        fill=True,
        fill_opacity=0.6
    ).add_to(parking_map)

# Add custom HTML title to the map
title_html = '''
     <h3 align="center" style="font-size:20px">Parking Spots</h3>
'''
parking_map.get_root().html.add_child(folium.Element(title_html))

# Display the map
parking_map
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook

This map displays the distribution of available parking spots across central Melbourne. Each blue segment represents a recorded parking location, providing a clear view of street-level and structured parking coverage.

The visualisation shows that parking availability is densest around the central and northern sections of the Melbourne CBD and surrounding commercial zones. Areas near Spencer Street, Queen Victoria Market, and Carlton exhibit a high density of parking facilities, which suggests these regions are better equipped to support events that require vehicle access.

This map was used primarily to support the identification of zones suitable for private events. In such cases, adequate parking is a key consideration to ensure convenient access for attendees, vendors, and service providers.

By combining this data with pedestrian footfall clusters, the analysis highlights locations that are quiet yet logistically accessible, making them ideal for private events.

In [30]:
# Group by lat/lon to count parking availability per coordinate
parking_agg = parking_df.groupby(['latitude', 'longitude']).size().reset_index(name='count')

# Prepare heatmap data
heat_data = [[row['latitude'], row['longitude'], row['count']] for _, row in parking_agg.iterrows()]

# Normalize weights
max_val = max(pt[2] for pt in heat_data)
heat_data = [[lat, lon, count / max_val] for lat, lon, count in heat_data]

# Create the base map
parking_map = folium.Map(location=[-37.81, 144.96], zoom_start=14, tiles='CartoDB positron')

# Add heatmap layer
HeatMap(
    data=heat_data,
    radius=10,
    blur=12,
    min_opacity=0.05,
    max_zoom=14
).add_to(parking_map)

# Add custom HTML title to the map
title_html = '''
     <h3 align="center" style="font-size:20px">Parking Availability Heatmap</h3>
'''
parking_map.get_root().html.add_child(folium.Element(title_html))

# Show the map
parking_map
Out[30]:
Make this Notebook Trusted to load map: File -> Trust Notebook

This heatmap shows areas with high parking availability in red and yellow. Blue areas have moderate availability, and areas with no color have little or no parking data. It’s a clear view of where parking is most concentrated in Melbourne.

Recognize Suitable Locations for Events¶

Zones Suitable for Public Events¶

In [31]:
# Identify moderate and high clusters
target_clusters = [
    cluster for cluster, color in color_map.items() 
    if color in ['darkorange', 'indianred']
]

# Filter relevant locations
public_event_locs = agg_df[agg_df['cluster'].isin(target_clusters)]

# Normalize data for heatmap
heat_data = [
    [row['lat'], row['lon'], row['avg_count']] 
    for _, row in public_event_locs.iterrows()
]
max_val = max(pt[2] for pt in heat_data)
heat_data = [[lat, lon, count / max_val] for lat, lon, count in heat_data]

# Create a clean base map (no markers, no legend)
public_event_map = folium.Map(location=[-37.81, 144.96], zoom_start=14, tiles='CartoDB positron')

# Add heatmap only
HeatMap(
    data=heat_data,
    radius=25,
    blur=20,
    min_opacity=0.3,
    max_zoom=14
).add_to(public_event_map)

# Add a title
title_html = '''
     <h3 align="center" style="font-size:20px">Zones Suitable for Public Events</h3>
'''
public_event_map.get_root().html.add_child(folium.Element(title_html))

public_event_map
Out[31]:
Make this Notebook Trusted to load map: File -> Trust Notebook

This heatmap displays areas in Melbourne identified as suitable for hosting public events. The highlighted zones are determined based on pedestrian activity levels, using clustering on average footfall data.

The most prominent zones shown in red and yellow indicate areas with the highest pedestrian counts. These are primarily located in and around the Melbourne Central Business District (CBD), which includes key public spaces, shopping districts, and major roads.

Such areas are ideal for public events like festivals, promotional campaigns, or community gatherings, as they offer high visibility and natural pedestrian engagement throughout the day.

The heatmap provides a spatial reference for selecting public event locations that are likely to attract attention and benefit from existing crowd movement, without the need for additional footfall generation.

Note: When identifying suitable zones for public events, parking availability was not considered as a filtering criterion. Public events typically attract large crowds where individual parking cannot be guaranteed or practically accommodated. Instead, pedestrian footfall was prioritised, as public events benefit from natural crowd flow and high visibility. Additionally, most attendees are expected to rely on public transport, walking, or other shared mobility options when attending such events, especially in central areas of Melbourne.
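The cluster labels used above are produced earlier in the notebook by running KMeans on average footfall counts. As a self-contained illustration of that step (the footfall values below are made up for the example, not the real sensor data), KMeans can band 1-D average counts into ordered activity levels:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical average hourly footfall per sensor (illustrative values only)
avg_counts = np.array([120, 135, 110, 980, 1020, 510, 540, 60, 55, 990]).reshape(-1, 1)

# Cluster the 1-D counts into four activity bands, as the notebook does
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km.fit_predict(avg_counts)

# Raw KMeans labels are arbitrary, so rank clusters by their centroid
# to get ordered activity levels (0 = lowest footfall, 3 = highest)
order = np.argsort(km.cluster_centers_.ravel())
level = {cluster: rank for rank, cluster in enumerate(order)}
levels = [level[l] for l in labels]
print(levels)  # → [1, 1, 1, 3, 3, 2, 2, 0, 0, 3]
```

Ranking clusters by centroid is what makes a colour map like the notebook's (`green` → `gold` → `darkorange` → `indianred`) stable across runs, since the raw label numbers themselves carry no meaning.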

Zones Suitable for Private Events¶

In [32]:
# Convert pedestrian to GeoDataFrame (buffered for proximity join)
ped_gdf = gpd.GeoDataFrame(
    agg_df, 
    geometry=gpd.points_from_xy(agg_df['lon'], agg_df['lat']),
    crs="EPSG:4326"
).to_crs(epsg=3857)

# Buffer each pedestrian sensor by 150 metres (in EPSG:3857 units).
# Note: Web Mercator metres are stretched by roughly 1/cos(latitude), so at
# Melbourne's latitude the effective ground radius is somewhat smaller than
# 150 m; a local projected CRS such as GDA94 / MGA zone 55 (EPSG:28355)
# would give more accurate distances.
ped_gdf['geometry'] = ped_gdf.geometry.buffer(150)

# Convert parking to GeoDataFrame
parking_gdf = gpd.GeoDataFrame(
    parking_df,
    geometry=gpd.points_from_xy(parking_df['longitude'], parking_df['latitude']),
    crs="EPSG:4326"
).to_crs(epsg=3857)
In [33]:
# Spatial join: which parking points fall inside pedestrian sensor buffers
joined = gpd.sjoin(parking_gdf, ped_gdf, how='inner', predicate='within')

# Count parking slots near each pedestrian location
parking_counts = joined.groupby('location_id')['location'].count().reset_index(name='parking_count')
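The buffer-and-join above is equivalent to counting parking points within 150 m of each sensor, which can be sanity-checked without geopandas using the haversine formula. A minimal sketch with illustrative coordinates (not the real datasets):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    R = 6371000  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

# Toy data: two sensors and three parking bays (coordinates are illustrative)
sensors = {"S1": (-37.8100, 144.9600), "S2": (-37.8200, 144.9700)}
parking = [(-37.8101, 144.9601), (-37.8102, 144.9598), (-37.8300, 144.9900)]

radius_m = 150
counts = {
    sid: sum(1 for p in parking if haversine_m(s[0], s[1], p[0], p[1]) <= radius_m)
    for sid, s in sensors.items()
}
print(counts)  # → {'S1': 2, 'S2': 0}
```

This distance-based check avoids the Web Mercator distortion inherent in buffering EPSG:3857 geometries, so it is a useful cross-check on the spatial join's results.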
In [34]:
# Merge back into pedestrian data
private_df = agg_df.merge(parking_counts, on='location_id', how='left')
private_df['parking_count'] = private_df['parking_count'].fillna(0)

# Criteria: low pedestrian footfall + medium/high parking
low_clusters = [
    cluster for cluster, color in color_map.items()
    if color in ['green', 'gold']
]
parking_threshold = private_df['parking_count'].median()

private_event_locs = private_df[
    (private_df['cluster'].isin(low_clusters)) &
    (private_df['parking_count'] >= parking_threshold)
]
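The filter above can be illustrated on a toy table (the cluster labels and parking counts below are made up, assuming clusters 0 and 1 correspond to the low-footfall bands):

```python
import pandas as pd

# Illustrative sensor table: cluster label and nearby parking count
df = pd.DataFrame({
    "location_id": [1, 2, 3, 4, 5],
    "cluster": [0, 0, 2, 1, 3],
    "parking_count": [12, 2, 30, 15, 40],
})

low_clusters = [0, 1]                       # low-footfall clusters (assumed)
threshold = df["parking_count"].median()    # median of [2, 12, 15, 30, 40] = 15.0

# Keep locations that are both quiet and reasonably well served by parking
candidates = df[df["cluster"].isin(low_clusters) & (df["parking_count"] >= threshold)]
print(candidates["location_id"].tolist())   # → [4]
```

Using the median as the parking threshold keeps the criterion relative to the dataset itself, so it adapts if the underlying parking data changes.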
In [35]:
# Prepare heatmap data
heat_data = [
    [row['lat'], row['lon'], row['parking_count']] 
    for _, row in private_event_locs.iterrows()
]
max_val = max(pt[2] for pt in heat_data)
heat_data = [[lat, lon, count / max_val] for lat, lon, count in heat_data]

# Create clean map
private_event_map = folium.Map(location=[-37.81, 144.96], zoom_start=14, tiles='CartoDB positron')

# Add heatmap
HeatMap(
    data=heat_data,
    radius=25,
    blur=20,
    min_opacity=0.3,
    max_zoom=14
).add_to(private_event_map)

# Add simple title
title_html = '''
     <h3 align="center" style="font-size:20px">Zones Suitable for Private Events</h3>
'''
private_event_map.get_root().html.add_child(folium.Element(title_html))

private_event_map
Out[35]:

This heatmap illustrates zones across Melbourne that are considered suitable for hosting private events. The suitability is determined based on two key criteria:

  1. Low or very low pedestrian footfall – to ensure privacy and minimal disruption.
  2. Medium to high nearby parking availability – to support easy access for attendees.

Brighter and more concentrated areas on the map indicate stronger suitability, where both conditions are met effectively.

The map highlights several clusters in quieter parts of the city, often away from the central business district. These areas are optimal for events such as weddings, workshops, or community meetings where a calm environment and accessibility are important.

This visualisation supports event planners in selecting locations where crowd levels are naturally lower but logistical support, like parking, is still available.

Conclusion¶

This use case demonstrates how urban data can be used to improve decision-making in event location planning. By analysing pedestrian patterns and parking availability, city planners and event organisers can make informed choices about where to host events based on foot traffic intensity, accessibility, and logistical feasibility.

The project distinguishes between public and private event needs—focusing on visibility and crowd flow for public events, and privacy with parking access for private ones. It also highlights the value of spatial data integration, clustering techniques, and interactive visualisations in supporting location-based decision-making.

The resulting recommendations help minimise disruption, improve attendee experience, and support Melbourne's broader urban management and planning goals. This approach provides a scalable framework that can be extended or replicated in other cities where similar data infrastructure exists, contributing to smarter, data-informed urban event strategies.

Recommendations¶

Event planners can use the interactive heatmaps generated from this analysis to make informed decisions:

  • Use the "Zones Suitable for Public Events" heatmap to identify high-footfall areas ideal for visibility and crowd engagement.

  • Use the "Zones Suitable for Private Events" heatmap to locate quieter zones with better parking access, ensuring minimal disruption and convenience for attendees.

These visual tools support quick, evidence-based location selection based on event type and urban dynamics.